Cell Magic Tutorial

Interaction with MLDB occurs via a REST API. Interacting with a REST API over HTTP from a Notebook interface can be laborious if you use a general-purpose Python library like requests directly, so MLDB comes with a Python library called pymldb to ease the pain.

pymldb does this in three ways:

the %mldb magics: these are Jupyter line- and cell-magic commands which allow you to make raw HTTP calls to MLDB, and which also provide some higher-level functions. This tutorial shows you how to use them.
the Python Resource class: this is a simple class which wraps the requests library so as to make HTTP calls to the MLDB API more friendly in a Notebook environment. Check out the Resource Wrapper Tutorial for more info on the Resource class.
the Python BatFrame class: this is a class that behaves like the Pandas DataFrame but offloads computation to the server via HTTP calls. Check out the BatFrame Tutorial for more info on the BatFrame.

The `%mldb` Magic System

Basic Magic

We'll start by initializing the %mldb magic system:



In [1]:

    
%reload_ext pymldb









    



mldb magic initialized with host as http://localhost

And then we'll ask it for some help:



In [2]:

    
%mldb help









    



Usage:

  Line magic functions:

    %mldb help          
                        Print this message
    
    %mldb init <url>    
                        Initialize the plugins for the cell magics.
                        Extension comes pre-initialized with <uri> 
                        set to "http://localhost"
    
    %mldb doc <kind>/<type>    
                        Shows documentation in an iframe. <kind> can
                        be one of "datasets", "functions", "procedures" or
                        "plugins" and <type> can be one of the installed
                        types, e.g. procedures/classifier. NB this will 
                        only work with an MLDB-hosted Notebook for now.

    %mldb query <sql>
                        Run an SQL-like query and return a pandas 
                        DataFrame. Dataset selection is done via the 
                        FROM clause.

    %mldb loadcsv <dataset> <url>
                        Create a dataset with id <dataset> from a CSV
                        hosted at the HTTP url <url>.
                        
    %mldb py <uri> <json args>
                        Run a python script named "main.py" from <uri>
                        and pass in <json args> as arguments.
                        <uri> can be one of:
                          - file://<rest of the uri>: a local directory
                          - gist://<rest of the uri>: a gist
                          - git://<rest of the uri>: a public git repo
                          - http(s)://<rest of the uri>: a file on the web

    %mldb pyplugin <name> <uri>
                        Load a python plugin called <name> from <uri> 
                        by executing its main.py. Any pre-existing plugin
                        called <name> will be deleted first.
                        <uri> can be one of:
                          - file://<rest of the uri>: a local directory
                          - gist://<rest of the uri>: a gist
                          - git://<rest of the uri>: a public git repo
                          - http(s)://<rest of the uri>: a file on the web
                          
    %mldb GET <route>
    %mldb DELETE <route>
                        HTTP GET/DELETE request to <route>. <route> should
                        start with a '/'.
                        
    %mldb GET <route> <json query params>
                        HTTP GET request to <route>, JSON will be used to
                        create query string. <route> should start with a '/'.
                        
    %mldb PUT <route> <json>
    %mldb POST <route> <json>
                        HTTP PUT/POST request to <route>, <json> will
                        be sent as JSON payload. <route> should start
                        with a '/'.
       
                        
  Cell magic functions:

    %%mldb py <json args>
    <python code>
                        Run a python script in MLDB from the cell body.
    
    %%mldb query
    <sql>
                        Run an SQL-like query from the cell body and return
                        a pandas DataFrame. Dataset selection is done via
                        the FROM clause.
    
    %mldb loadcsv <dataset>
    <csv>
                        Create a dataset with id <dataset> from a CSV
                        in the cell body.
                        
    %%mldb GET <route>
    <json query params>
                        HTTP GET request to <route>, cell body will be
                        parsed as JSON and used to create query string.
                        <route> should start with a '/'.
                        
    %%mldb PUT <route>
    <json>
    %%mldb POST <route>
    <json>
                        HTTP PUT/POST request to <route>, cell body will
                        be sent as JSON payload. <route> should start
                        with a '/'.

The most basic way in which the %mldb magic can help us with MLDB's REST API is by allowing us to type natural-feeling REST commands, like this one, which will list all of the available dataset types:



In [3]:

    
%mldb GET /v1/types/datasets









    Out[3]:




GET http://localhost/v1/types/datasets
200 OK
 [
  "beh", 
  "beh.binary", 
  "beh.live", 
  "beh.mutable", 
  "beh.ranged", 
  "embedding", 
  "merged", 
  "sqliteSparse", 
  "transposed"
]

You can use similar syntax to make PUT, POST and DELETE requests as well.
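To make the mapping between a magic line and the underlying HTTP call concrete, here is a minimal sketch that builds (but does not send) the corresponding requests with Python's standard library. It is not pymldb's actual implementation: the `build_request` helper, the dataset id `example`, the `{"type": ...}` payload shape and the plain URL encoding of GET parameters are all assumptions made for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Default host reported when the magic system is initialized.
HOST = "http://localhost"


def build_request(method, route, payload=None):
    """Build (without sending) an HTTP request for an MLDB route.

    Assumption: a GET payload becomes a query string and a PUT/POST
    payload becomes a JSON body, mirroring the help text above.
    """
    url = HOST + route
    if payload is None:
        return Request(url, method=method)
    if method == "GET":
        return Request(url + "?" + urlencode(payload), method=method)
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(url, data=body, headers=headers, method=method)


# Counterpart of: %mldb GET /v1/types/datasets
get_req = build_request("GET", "/v1/types/datasets")

# Counterpart of a PUT with a JSON payload (hypothetical dataset id and config).
put_req = build_request("PUT", "/v1/datasets/example", {"type": "beh.mutable"})

print(get_req.get_method(), get_req.full_url)
print(put_req.get_method(), put_req.full_url, put_req.data.decode("utf-8"))
```

Passing one of these objects to `urllib.request.urlopen` would actually send it, which is the step the magic performs for you before printing the status line and the response body.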

Advanced Magic

The %mldb magic system also includes syntax to do more advanced operations like loading and querying data. Let's load the dataset from the Predicting Titanic Survival demo with a single command (after deleting it first if it's already loaded):



In [4]:

    
%mldb DELETE /v1/datasets/titanic
%mldb loadcsv titanic https://raw.githubusercontent.com/datacratic/mldb-pytanic-plugin/master/titanic_train.csv









    



Success!

And now let's run an SQL query on it:



In [5]:

    
%mldb query select * from titanic limit 5









    Out[5]:






  
    
      
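_rowName | Age | Cabin  | Embarked | Fare    | Name                        | Parch | PassengerId | Pclass | Sex  | SibSp | Ticket   | label
---------+-----+--------+----------+---------+-----------------------------+-------+-------------+--------+------+-------+----------+------
0        | 22  |        | S        | 7.25    | BraundMr.OwenHarris         | 0     | 1           | 3      | male | 1     | A/521171 | 0
97       | 23  | D10D12 | C        | 63.3583 | GreenfieldMr.WilliamBertram | 1     | 98          | 1      | male | 0     | PC17759  | 1
273      | 37  | C118   | C        | 29.7    | NatschMr.CharlesH           | 1     | 274         | 1      | male | 0     | PC17596  | 0
524      |     |        | C        | 7.2292  | KassemMr.Fared              | 0     | 525         | 3      | male | 0     | 2700     | 0
278      | 7   |        | Q        | 29.125  | RiceMaster.Eric             | 1     | 279         | 3      | male | 4     | 382652   | 0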

We can get the results out as a Pandas DataFrame just as easily:



In [6]:

    
df = %mldb query select * from titanic
type(df)









    Out[6]:





pandas.core.frame.DataFrame
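Once the result is an ordinary DataFrame, the usual pandas operations apply. The sketch below does not talk to MLDB: it rebuilds a few columns of the five rows displayed earlier by hand, and it assumes the numeric columns arrive with numeric dtypes, which may not hold for every dataset.

```python
import pandas as pd

# A few columns of the five rows shown in the query output, keyed by _rowName.
# None marks the row whose Age is blank in that output.
df = pd.DataFrame(
    {
        "Age": [22, 23, 37, None, 7],
        "Pclass": [3, 1, 1, 3, 3],
        "Sex": ["male"] * 5,
        "label": [0, 1, 0, 0, 0],
    },
    index=pd.Index([0, 97, 273, 524, 278], name="_rowName"),
)

# Mean of the label column per passenger class.
label_mean_by_class = df.groupby("Pclass")["label"].mean()
print(label_mean_by_class)

# Missing values are skipped by default when averaging.
print(df["Age"].mean())
```

With the real dataset you would replace the hand-built frame by the `df` returned from the query magic.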

Server-Side Python Magic

Python code executed in a normal Notebook cell runs within the Notebook's Python interpreter. MLDB also supports sending Python scripts via HTTP for execution within its own in-process Python interpreter. Server-side Python code gets access to a high-performance version of the REST API which bypasses HTTP, via an mldb.perform() function.

There's an %mldb magic command for running server-side Python code, from the comfort of your Notebook:



In [7]:

    
%%mldb py

# this code will run on the server!
print(mldb.perform("GET", "/v1/types/datasets", [], {})["response"])









    



["beh","beh.binary","beh.live","beh.mutable","beh.ranged","embedding","merged","sqliteSparse","transposed"]
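The output above suggests that the "response" entry is a JSON string rather than a parsed object, so server-side code would typically decode it before use. The sketch below illustrates that step outside MLDB with a hypothetical stand-in for the `mldb` object: the `FakeMldb` class, its parameter names and its canned reply are assumptions, and only the call shape and the reply text are taken from the cell above.

```python
import json


class FakeMldb(object):
    """Hypothetical stand-in for the server-side `mldb` object."""

    def perform(self, verb, route, query_params, payload):
        # Canned reply copied from the output above; a real server would
        # dispatch the call to its REST handlers instead.
        types = [
            "beh", "beh.binary", "beh.live", "beh.mutable", "beh.ranged",
            "embedding", "merged", "sqliteSparse", "transposed",
        ]
        return {"response": json.dumps(types, separators=(",", ":"))}


mldb = FakeMldb()

# Same call shape as in the server-side cell above.
result = mldb.perform("GET", "/v1/types/datasets", [], {})

# Decode the JSON text into a Python list before working with it.
dataset_types = json.loads(result["response"])
print(len(dataset_types), dataset_types[0])
```

Inside a real `%%mldb py` cell the `mldb` object is provided by the server, so only the last three statements would be needed.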

Putting it all together

Now that you've seen the basics, check out the Mapping Reddit demo to see how to use the %mldb magic system to do machine learning with MLDB.


